Semi-Markov Clustering
Author
Abstract
Clustering of subsequences of time series is a widely studied and applied class of techniques [3, 4, 2, 6, 1, 5, 7] that aims to interpret a time series as a shorter sequence of symbols, each of which represents a segment of the series whose pattern is similar to that of the other segments assigned the same symbol. The field took a dramatic turn after Keogh, Lin and Truppel [4] published a paper claiming that clustering subsequences with sliding-window approaches is meaningless. They had discovered that the centroids produced by techniques such as standard k-means clustering of the segments created by a fixed-size sliding window often look the same regardless of the input data. A fixed-size sliding window often creates segments that are in different phases with respect to the pattern in the data, which makes it difficult to learn meaningful patterns. Here we extend the objective function of k-means clustering from fixed-window segments to segmentations of varying length. In this new clustering framework there is no need for overlapping windows. Semi-Markov clustering does not cluster a fixed set of segments; instead it considers all possible segmentations in order to find one that divides the sequence into consistent patterns. Unlike in the fixed-window setting, the result is a segmentation into consistent patterns and centroids that look exactly like those patterns. We report results on several datasets, including the artificial Cylinder-Bell-Funnel set, ECG, and accelerometer traces from body-worn sensors.
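The idea of moving the k-means objective from fixed windows to variable-length segmentations can be illustrated with a minimal sketch. This is not the paper's algorithm. It makes several assumptions that the abstract does not state: segments are compared to centroids after linear resampling to a common length `m`, segment lengths are bounded by `lmin` and `lmax`, each segment's squared error is weighted by its length, and the best segmentation for fixed centroids is found by dynamic programming. The names `resample`, `best_segmentation`, and `fit` are hypothetical.

```python
import numpy as np

def resample(seg, m):
    """Linearly interpolate a segment to m points (assumed way to compare lengths)."""
    src = np.linspace(0.0, 1.0, num=len(seg))
    dst = np.linspace(0.0, 1.0, num=m)
    return np.interp(dst, src, seg)

def best_segmentation(x, centroids, lmin, lmax):
    """Dynamic program over all segmentations with segment lengths in [lmin, lmax].

    Returns a list of (start, end, cluster) triples covering x and the total cost.
    """
    n, m = len(x), centroids.shape[1]
    cost = np.full(n + 1, np.inf)   # cost[e] = best cost of segmenting x[:e]
    cost[0] = 0.0
    back = [None] * (n + 1)         # back[e] = (start, cluster) of the last segment
    for end in range(lmin, n + 1):
        for length in range(lmin, min(lmax, end) + 1):
            start = end - length
            if not np.isfinite(cost[start]):
                continue
            # Squared distance to every centroid, weighted by segment length
            # so that costs are comparable across segmentations (assumption).
            d = ((centroids - resample(x[start:end], m)) ** 2).sum(axis=1) * length / m
            k = int(d.argmin())
            if cost[start] + d[k] < cost[end]:
                cost[end] = cost[start] + d[k]
                back[end] = (start, k)
    if not np.isfinite(cost[n]):
        raise ValueError("no segmentation with the given length bounds covers x")
    segments, end = [], n
    while end > 0:                  # walk the back-pointers from the end
        start, k = back[end]
        segments.append((start, end, k))
        end = start
    return segments[::-1], float(cost[n])

def fit(x, k, m, lmin, lmax, n_iter=10, seed=0):
    """Alternate between re-segmenting and re-estimating centroids, as in k-means."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(x) - m + 1, size=k)
    centroids = np.stack([x[s:s + m] for s in starts])  # init from random windows
    for _ in range(n_iter):
        segments, _ = best_segmentation(x, centroids, lmin, lmax)
        for j in range(k):
            members = [(resample(x[a:b], m), b - a) for a, b, c in segments if c == j]
            if members:             # keep the old centroid if the cluster is empty
                shapes, lengths = zip(*members)
                centroids[j] = np.average(shapes, axis=0, weights=lengths)
    segments, total = best_segmentation(x, centroids, lmin, lmax)
    return segments, centroids, total
```

Calling `fit(x, k=2, m=8, lmin=6, lmax=14)` on a one-dimensional series returns non-overlapping segments that tile the whole series, each labelled with a cluster index, so no overlapping windows are involved. The length-weighted centroid update matches the length-weighted cost, which keeps each alternation step from increasing the objective under these assumptions.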
Similar resources
Systemic Risk Evaluation of Banks and financial institutions applying Markov clustering method and centrality measures of risk
Systemic risk is the risk borne by an economic system because of a special organization. This means that a liquidity problem or a financial crisis in one company could trigger a chain of reactions that puts the whole market into trouble. This kind of risk was underestimated until the 2008 financial crisis. Now federal regulations exist for controlling this risk of financial institutions. Among div...
Clustering Heterogeneous Data with Mutual Semi-supervision
We propose a new methodology for clustering data comprising multiple domains or parts, in such a way that the separate domains mutually supervise each other within a semi-supervised learning framework. Unlike existing uses of semi-supervised learning, our methodology does not assume the presence of labels from part of the data, but rather, each of the different domains of the data separately un...
Microsoft Word - Hybridmodel2.dot
Today’s state-of-the-art speech recognition systems typically use continuous density hidden Markov models with mixture of Gaussian distributions. Such speech recognition systems have problems; they require too much memory to run, and are too slow for large vocabulary applications. Two approaches are proposed for the design of compact acoustic models, namely, subspace distribution clustering hid...
Semi-supervised Clustering using Combinatorial MRFs
A combinatorial random variable is a discrete random variable defined over a combinatorial set (e.g., a power set of a given set). In this paper we introduce combinatorial Markov random fields (Comrafs), which are Markov random fields where some of the nodes are combinatorial random variables. We argue that Comrafs are powerful models for unsupervised learning by showing their relationship with...
Probabilistic Semi-Supervised Clustering with Constraints
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to the same or different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints to either modify the objective function, or to learn th...
A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
Recently, a number of methods have been proposed for semi-supervised clustering that employ supervision in the form of pairwise constraints. We describe a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that incorporates relational supervision. The model leads to an EM-style clustering algorithm, the E-step of which requires collective assignment of...
Journal title:
Volume, issue:
Pages: -
Publication date: 2009